Dynamic data summarization for hierarchical spatial clustering
Kayumov, Abduaziz, Kim, Min Sik, Shin, Ji Sun
Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) finds meaningful patterns in spatial data by considering density and spatial proximity. Because the clustering algorithm is inherently designed for static applications, recent studies have focused on accelerating it in static settings using approximate or parallel methods. However, much less attention has been given to dynamic environments, where even a single point insertion or deletion can require recomputing the clustering hierarchy from scratch due to the need to maintain the minimum spanning tree (MST) over a complete graph. This paper addresses the challenge of enhancing the clustering algorithm for dynamic data. We present an exact algorithm that maintains density information and updates the clustering hierarchy of HDBSCAN during point insertions and deletions. Given the difficulty of scaling the exact algorithm to modern dynamic workloads, we propose an online-offline framework. The online component efficiently summarizes dynamic data using a tree structure, called the Bubble-tree, while the offline step performs the static clustering. Experimental results demonstrate that the data summarization adapts well to fully dynamic environments, providing compression quality on par with existing techniques while significantly improving the runtime performance of the clustering algorithm on dynamic data workloads.
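The abstract leaves the Bubble-tree's internals unspecified; as a rough illustration of the online-offline split it describes, the sketch below maintains flat "bubble" summaries (a count plus a centroid) online, and hands the weighted representatives to an offline clustering run. The class names and the fixed absorption threshold are assumptions for illustration, not the paper's actual data structure.

```python
import math

class Bubble:
    """Summary of nearby points: a count and a linear sum (for the centroid)."""
    def __init__(self, point):
        self.n = 1
        self.ls = list(point)  # linear sum of coordinates

    def centroid(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]

class OnlineSummarizer:
    """Online step: absorb each arriving point into the nearest bubble
    within `threshold`, or open a new bubble for it."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.bubbles = []

    def insert(self, point):
        best, best_d = None, float("inf")
        for b in self.bubbles:
            d = math.dist(point, b.centroid())
            if d < best_d:
                best, best_d = b, d
        if best is not None and best_d <= self.threshold:
            best.absorb(point)
        else:
            self.bubbles.append(Bubble(point))

    def summary(self):
        """Input for the offline step: weighted representatives
        (centroid, point count) to feed a static clustering run."""
        return [(b.centroid(), b.n) for b in self.bubbles]
```

The offline step would then cluster only the (much smaller) summary, re-running static HDBSCAN as the abstract describes.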
Bi-Filtration and Stability of TDA Mapper for Point Cloud Data
Carlsson, Singh and Memoli's TDA mapper takes a point cloud dataset and outputs a graph that depends on several parameter choices. Dey, Memoli, and Wang developed Multiscale Mapper for abstract topological spaces so that parameter choices can be analyzed via persistent homology. However, when applied to actual data, one does not always obtain filtrations of mapper graphs. DBSCAN, one of the most common clustering algorithms used in the TDA mapper software, has two parameters, \textbf{$\epsilon$} and \textbf{MinPts}. If \textbf{MinPts = 1} then DBSCAN is equivalent to single linkage clustering with cutting height \textbf{$\epsilon$}. We show that if DBSCAN clustering is used with \textbf{MinPts $>$ 2}, a filtration of mapper graphs may not exist except in the absence of free-border points; but such filtrations exist if DBSCAN clustering is used with \textbf{MinPts = 1} or \textbf{2} as the cover size increases, \textbf{$\epsilon$} increases, and/or \textbf{MinPts} decreases. However, the 1-dimensional filtration is unstable. If one adds noise to a data set so that each data point has been perturbed by a distance at most \textbf{$\delta$}, the persistent homology of the mapper graph of the perturbed data set can be significantly different from that of the original data set. We show that we can obtain stability by increasing both the cover size and \textbf{$\epsilon$} at the same time. In particular, we show that the bi-filtrations of the homology groups with respect to cover size and $\epsilon$ between these two datasets are \textbf{2$\delta$}-interleaved.
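The MinPts = 1 equivalence claimed above is easy to check concretely: with MinPts = 1 every point is a core point, so DBSCAN's clusters are exactly the connected components of the graph joining points at distance at most eps, i.e., single-linkage clusters cut at height eps. A minimal union-find sketch (all names illustrative):

```python
import math
from itertools import combinations

def eps_components(points, eps):
    """Single-linkage clustering cut at height eps: union together every
    pair of points at distance <= eps, then report component labels."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

With MinPts = 1, DBSCAN produces these same components and labels no point as noise.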
IPD: An Incremental Prototype based DBSCAN for large-scale data with cluster representatives
Saha, Jayasree, Mukherjee, Jayanta
DBSCAN is a fundamental density-based clustering technique that identifies clusters of arbitrary shape. However, it becomes infeasible when handling big data. On the other hand, centroid-based clustering is important for detecting patterns in a dataset, since unprocessed data points can be labeled with their nearest centroid; however, it cannot detect non-spherical clusters. For large data, it is not feasible to store and compute labels for every sample; this can instead be done as and when the information is required. That purpose is served when clustering acts as a tool to identify cluster representatives and a query is answered by assigning the cluster label of the nearest representative. In this paper, we propose an Incremental Prototype-based DBSCAN (IPD) algorithm designed to identify arbitrary-shaped clusters in large-scale data. Additionally, it chooses a set of representatives for each cluster.
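The query-serving step the abstract describes (label a new point by its nearest cluster representative) can be sketched in a few lines; the function name and the representative format are assumptions, not IPD's API:

```python
import math

def assign_label(point, representatives):
    """Serve a query by returning the cluster label of the nearest
    representative. `representatives` is a list of (coords, label) pairs
    produced by a prior clustering pass."""
    _, label = min(representatives, key=lambda r: math.dist(point, r[0]))
    return label
```

This is what makes the approach cheap at query time: only the representatives, not the full dataset, need to be stored and scanned.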
AMD-DBSCAN: An Adaptive Multi-density DBSCAN for datasets of extremely variable density
Wang, Ziqing, Ye, Zhirong, Du, Yuyang, Mao, Yi, Liu, Yanying, Wu, Ziling, Wang, Jun
DBSCAN has been widely used in density-based clustering. However, with the increasing demand for multi-density clustering, traditional DBSCAN cannot produce good clustering results on multi-density datasets. To address this problem, an adaptive multi-density DBSCAN algorithm (AMD-DBSCAN) is proposed in this paper. An improved parameter adaptation method is proposed in AMD-DBSCAN to search for multiple parameter pairs (i.e., Eps and MinPts), which are the key parameters that determine the clustering results and performance, thereby allowing the model to be applied to multi-density datasets. Moreover, only one hyperparameter is required for AMD-DBSCAN, avoiding complicated repetitive initialization operations. Furthermore, the variance of the number of neighbors (VNN) is proposed to measure the difference in density between clusters. The experimental results show that our AMD-DBSCAN reduces execution time by an average of 75% due to lower algorithm complexity compared with the traditional adaptive algorithm. In addition, AMD-DBSCAN improves accuracy by 24.7% on average over the state-of-the-art design on multi-density datasets of extremely variable density, while incurring no performance loss in single-density scenarios. Our code and datasets are available at https://github.com/AlexandreWANG915/AMD-DBSCAN.
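The abstract names VNN but does not give its formula; one plausible reading, sketched below, is the variance of the per-point count of Eps-neighbors, which is zero for uniformly dense data and grows as densities diverge. This is an assumption for illustration, not the paper's exact definition:

```python
import math

def vnn(points, eps):
    """Variance of the number of eps-neighbors per point (a hypothetical
    reading of the paper's VNN measure)."""
    counts = [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)
```

Under this reading, a high VNN would signal that a single (Eps, MinPts) pair is inadequate and multiple parameter pairs are needed.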
Detecting Point Outliers Using Prune-based Outlier Factor (PLOF)
Babaei, Kasra, Chen, ZhiYuan, Maul, Tomas
Outlier detection (also known as anomaly detection or deviation detection) is the process of detecting data points whose patterns deviate significantly from the rest. Outliers are common in industry applications, where they can be generated by causes such as human error, fraudulent activity, or system failure. Recently, density-based methods have shown promising results, among which the Local Outlier Factor (LOF) is arguably dominant. However, one of the major drawbacks of LOF is that it is computationally expensive. Motivated by this problem, this research presents a novel pruning-based procedure that reduces the execution time of LOF while maintaining its performance. A novel Prune-based Local Outlier Factor (PLOF) approach is proposed in which, prior to employing LOF, the outlierness of each data instance is measured. Next, based on a threshold, the data instances that require further investigation are separated, and the LOF score is computed only for these points. Extensive experiments have been conducted and the results are promising: comparisons with the original LOF and two state-of-the-art variants of LOF show that PLOF produces higher accuracy and precision while reducing execution time.
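The prune-then-score pattern the abstract describes can be sketched generically: a cheap proxy (here, distance to the k-th nearest neighbor; the paper's actual outlierness measure is not specified in the abstract) decides which points receive the expensive LOF-style score, and the rest are declared inliers. All names and the proxy choice are illustrative assumptions:

```python
import math

def prune_then_score(points, k, threshold, expensive_score):
    """PLOF-style pruning sketch: score point i with `expensive_score(i)`
    only if its cheap outlierness proxy exceeds `threshold`; otherwise
    assign 0.0 (treated as an inlier)."""
    results = {}
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        cheap = dists[k - 1]  # distance to the k-th nearest neighbor
        results[i] = expensive_score(i) if cheap > threshold else 0.0
    return results
```

The saving comes from invoking the expensive scorer on only the small flagged subset rather than the whole dataset.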
DIR-ST$^2$: Delineation of Imprecise Regions Using Spatio--Temporal--Textual Information
Tran, Cong, Shin, Won-Yong, Choi, Sang-Il
In the literature, an imprecise region refers to a geographical area without a clearly defined boundary. Previous clustering-based approaches exploit spatial information to find such regions. However, the prior studies suffer from two problems: subjectivity in selecting clustering parameters and the inclusion of a large portion of the undesirable region (i.e., a large number of noise points). To overcome these problems, we present DIR-ST$^2$, a novel framework for delineating an imprecise region by iteratively performing density-based clustering, namely DBSCAN, using not only spatio--textual but also temporal information from social media. Specifically, we aim at finding a proper radius of the circle used in the iterative DBSCAN process by gradually reducing the radius at each iteration, leveraging the temporal information acquired from all resulting clusters. We then propose an efficient and automated algorithm that delineates the imprecise region via hierarchical clustering. Experimental results show that, by virtue of the significant noise reduction in the region, our DIR-ST$^2$ method outperforms the state-of-the-art approach employing a one-class support vector machine in terms of the $\mathcal{F}_1$ score computed against precisely-defined regions regarded as ground truth, and returns noticeably better delineations of imprecise regions. The computational complexity of DIR-ST$^2$ is also shown analytically and numerically.
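The "gradually reducing the radius" loop can be sketched as a simple geometric schedule feeding successive DBSCAN runs; the shrink factor and stopping radius below are assumptions, since the abstract only says the radius decreases each iteration:

```python
def radius_schedule(eps0, shrink, min_eps):
    """Yield the DBSCAN radius for each iteration of an iterative
    shrinking-radius scheme: start at eps0 and multiply by `shrink`
    (0 < shrink < 1) until the radius drops below min_eps."""
    eps = eps0
    while eps >= min_eps:
        yield eps
        eps *= shrink
```

Each yielded radius would drive one DBSCAN pass, with the clusters' temporal statistics deciding when to stop shrinking in the actual framework.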
Novel Density-Based Clustering Algorithms for Uncertain Data
Zhang, Xianchao (Dalian University of Technology) | Liu, Han (Dalian University of Technology) | Zhang, Xiaotong (Dalian University of Technology) | Liu, Xinyue (Dalian University of Technology)
Density-based techniques seem promising for handling data uncertainty in uncertain data clustering. Nevertheless, some issues have not been addressed well in existing algorithms. In this paper, we first propose a novel density-based uncertain data clustering algorithm, which improves upon existing algorithms in the following two aspects: (1) it employs an exact method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in previous work; (2) it introduces new definitions of core object probability and direct reachability probability, thus reducing the complexity and avoiding sampling. We then further improve the algorithm by using a novel assignment strategy to ensure that every object will be assigned to the most appropriate cluster. Experimental results show the superiority of our proposed algorithms over existing ones.
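For uncertain objects modeled as finite discrete distributions, the probability in aspect (1) can be computed exactly by summing over all outcome pairs, rather than sampling. The discrete-distribution model below is an assumption for illustration; the paper's exact method targets its own uncertainty model:

```python
import math

def prob_within(obj_a, obj_b, eps):
    """Exact P(dist(A, B) <= eps) for two independent uncertain objects
    given as discrete distributions: lists of (coords, probability)
    pairs whose probabilities each sum to 1."""
    return sum(pa * pb
               for a, pa in obj_a
               for b, pb in obj_b
               if math.dist(a, b) <= eps)
```

Quantities like a core-object probability could then be built on top of such pairwise probabilities without any sampling step.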